roachtest: add pg_dump round-trip compatibility test #167543
Conversation
|
😎 Merged successfully - details. |
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. It looks like your PR touches SQL parser code but doesn't add or edit parser tests. Please make sure you add or edit parser tests if you edit the parser. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link) |
spilchen
left a comment
There was a problem hiding this comment.
nice test!
@spilchen reviewed 11 files and all commit messages, and made 5 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on golgeek, mw5h, rafiss, and srosenberg).
pkg/cmd/roachtest/tests/pg_dump.go line 451 at r1 (raw file):
} } fillerDB.Close()
nit: any reason we can't use defer fillerDB.Close()?
pkg/cmd/roachtest/tests/pg_dump.go line 36 at r4 (raw file):
// that pg_dump needs to handle correctly when dumping a CockroachDB // database. var pgDumpTestSchema = `
I don't see DOMAIN in this schema. Unless it's trivial, I don't think we should add one here, as this PR is already pretty good. But we may want to add to the DOMAIN epic to track. I can add one if you agree.
In general this is the thing we should update as we add new pg compatible features. Do you have any ideas of how we can remember to update this? Maybe an update to pkg/sql/CLAUDE.md or pkg/sql/schemachanger/CLAUDE.md (less certain about that one)?
pkg/cmd/roachtest/tests/pg_dump.go line 704 at r4 (raw file):
) } if err != nil {
is this err check meant as a catch for any error found when parsing the restore errors file above? If so, I don't see how err is updated.
pkg/sql/pg_catalog.go line 2505 at r4 (raw file):
} // addRowForPgIndex generates a row for pg_index for a given table and index.
nit: this comment seems out of place how. Can we move it down to be just above addRowForPgIndex
rafiss
left a comment
There was a problem hiding this comment.
@rafiss made 4 comments and resolved 2 discussions.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on golgeek, mw5h, spilchen, and srosenberg).
pkg/cmd/roachtest/tests/pg_dump.go line 451 at r1 (raw file):
Previously, spilchen wrote…
nit: any reason we can't use
defer fillerDB.Close()?
done
pkg/cmd/roachtest/tests/pg_dump.go line 36 at r4 (raw file):
Previously, spilchen wrote…
I don't see DOMAIN in this schema. Unless it's trivial, I don't think we should add one here, as this PR is already pretty good. But we may want to add to the DOMAIN epic to track. I can add one if you agree.
In general this is the thing we should update as we add new pg compatible features. Do you have any ideas of how we can remember to update this? Maybe an update to
pkg/sql/CLAUDE.mdorpkg/sql/schemachanger/CLAUDE.md(less certain about that one)?
Sounds good. let's keep this PR focused. please go ahead and add a separate issue for DOMAIN. Feel free to assign it to me too; i'd like to make sure that works as well
for the reminder/documentation, I added a section to pkg/sql/schemachanger/CLAUDE.md since most new pg-compatible schema features land in the declarative schema changer
pkg/cmd/roachtest/tests/pg_dump.go line 704 at r4 (raw file):
Previously, spilchen wrote…
is this
errcheck meant as a catch for any error found when parsing the restore errors file above? If so, I don't see howerris updated.
done; i improved the comment to say that this is meant for non-fatal psql errors
pkg/sql/pg_catalog.go line 2505 at r4 (raw file):
Previously, spilchen wrote…
nit: this comment seems out of place how. Can we move it down to be just above
addRowForPgIndex
done
Add a weekly roachtest that validates pg_dump round-trip compatibility with CockroachDB. The test builds pg_dump from PostgreSQL source (REL_18_3), creates a representative schema in a source database, dumps it with `pg_dump --schema-only` using the `pg_dump_compatibility` session variable, restores into a target database via psql, dumps again, and compares the two outputs. Known deviations are tracked via an expected-diff baseline file in testdata, following the same pattern as the pg_regress roachtest. The schema exercises a broad surface area: - Enums, basic column types, default expressions, comments, and multi-schema objects. - Secondary indexes (partial, covering, expression, GIN on JSONB and text arrays, hash-sharded, vector). - Foreign keys covering every ON DELETE/UPDATE action, multi-column FKs, self-referencing FKs, and cyclic FKs added via ALTER. - Sequences with various options and OWNED BY columns. - Views (including multi-level dependencies) and materialized views. - Identity columns (BY DEFAULT and ALWAYS), generated/computed columns, composite primary keys. - User-defined functions, stored procedures, triggers, composite types, and row-level security policies. - Type variations: collated text, INET, BIT, BIT VARYING, INTERVAL with precision, parameterized TIMESTAMPTZ, NUMERIC, CHAR, and enum arrays. - Identifier quoting edge cases: reserved-keyword names, mixed-case identifiers, and identifiers near the 63-byte PG limit. In addition, the test: - Creates 3100 filler tables before the schema to push descriptor IDs into the range of PostgreSQL catalog OIDs (1247-6104), verifying that tableoid remapping in pg_dump_compatibility mode handles collisions with built-in catalog OIDs like pg_class (1259) and pg_type (1247). - Verifies the source dump contains expected patterns (CREATE TABLE, CREATE FUNCTION, COMMENT ON, etc.) to catch cases where pg_dump silently omits objects from both dumps -- a gap the round-trip diff comparison cannot detect. Patterns can be marked as known failures to log warnings without failing the test. - Extracts ERROR lines from psql's stderr into a restore_errors.txt artifact so restore failures are visible even when psql exits 0. Resolves: cockroachdb#167442 Epic: CRDB-28751 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
format_type now emits `vector(N)` when atttypmod is set, instead of dropping the dimension and returning bare `vector`. pgvector stores the dimension directly in typmod (no -4 header offset). Epic: CRDB-62553 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
PostgreSQL keeps every collation in the pg_catalog namespace and emits COLLATE clauses qualified that way. The grammar now accepts the qualified form (only when the schema part is exactly pg_catalog), strips the qualifier, and validates the trailing locale as before. Other schemas are rejected to match PostgreSQL's "collation does not exist" semantics. Epic: CRDB-62553 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
CockroachDB attaches a synthetic primary key on the hidden rowid column to every materialized view, but PostgreSQL matviews carry no constraint or index of their own. Consumers that walk pg_catalog (e.g. pg_dump) saw the synthetic PK as both a pg_constraint row and a pg_index entry and tried to recreate it. Skip pg_constraint rows entirely for materialized views (matching PG's "matviews have no constraints" behavior), and skip primary-key entries in pg_index / pg_indexes for matviews unconditionally. While here, also apply the existing pg_constraint hidden-PK filter to pg_index for ordinary tables (gated on show_primary_key_constraint_on_not_visible_columns) so the two views are consistent about what they expose. Epic: CRDB-62553 Release note (sql change): pg_catalog no longer shows the synthetic primary key constraint or its backing index for materialized views. This matches PostgreSQL, which never attaches a constraint or index to a matview implicitly. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
spilchen
left a comment
There was a problem hiding this comment.
@spilchen reviewed 2 files, made 2 comments, and resolved 1 discussion.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on golgeek, mw5h, rafiss, shghasemi, and srosenberg).
pkg/cmd/roachtest/tests/pg_dump.go line 36 at r4 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
Sounds good. let's keep this PR focused. please go ahead and add a separate issue for DOMAIN. Feel free to assign it to me too; i'd like to make sure that works as well
for the reminder/documentation, I added a section to pkg/sql/schemachanger/CLAUDE.md since most new pg-compatible schema features land in the declarative schema changer
I added #170830. Thanks.
|
/trunk merge TFTR! |
Add a nightly roachtest that validates pg_dump round-trip compatibility
with CockroachDB. The test builds pg_dump from PostgreSQL source
(REL_18_3), creates a representative schema in a source database,
dumps it with
pg_dump --schema-onlyusing thepg_dump_compatibilitysession variable, restores into a target database via psql, dumps
again, and compares the two outputs.
Known deviations are tracked via an expected-diff baseline file in
testdata, following the same pattern as the pg_regress roachtest.
The schema exercises a broad surface area:
multi-schema objects.
text arrays, hash-sharded, vector).
FKs, self-referencing FKs, and cyclic FKs added via ALTER.
columns, composite primary keys.
types, and row-level security policies.
with precision, parameterized TIMESTAMPTZ, NUMERIC, CHAR, and
enum arrays.
identifiers, and identifiers near the 63-byte PG limit.
In addition, the test:
into the range of PostgreSQL catalog OIDs (1247-6104), verifying
that tableoid remapping in pg_dump_compatibility mode handles
collisions with built-in catalog OIDs like pg_class (1259) and
pg_type (1247).
CREATE FUNCTION, COMMENT ON, etc.) to catch cases where pg_dump
silently omits objects from both dumps -- a gap the round-trip diff
comparison cannot detect. Patterns can be marked as known failures
to log warnings without failing the test.
artifact so restore failures are visible even when psql exits 0.
This PR also has 3 commits that provide minor compatibility fixes to make pg_dump work:
Resolves: #167442
Epic: CRDB-28751
Release note: None